Abstract. In this paper we consider the problem of estimating a parameter of a probability distribution when we have some prior information on a nuisance parameter. We start with the very simple case where the value of the nuisance parameter is known perfectly; the complete likelihood is the classical tool in this case. Then, progressively, we consider the case where we are given a prior probability distribution on this nuisance parameter; the marginal likelihood is then the classical tool. Next, we consider the case where we only know a fixed number of its moments. Here, we may use the maximum entropy (ME) principle to assign a prior law and thus go back to the previous case. Finally, we consider the case where we know only its median. To our knowledge, there is no classical tool for this case. We therefore propose a new tool for it, based on a recently proposed alternative to the marginal probability distribution. This new criterion is obtained by first remarking that the marginal distribution can be considered as the mean value of the original distribution over the prior probability law of the nuisance parameter, and then by using the median in place of the mean. In this paper, we first summarize the classical tools used for the first three cases, then we give the precise definition of this new criterion and its properties and, finally, present a few examples to show the differences between these cases.
INTRODUCTION



We consider the problem of estimating a parameter of interest $\theta$ of a probability distribution, from only one or a finite number of samples of this distribution, when we have some prior information on a nuisance parameter $\nu$. Assume that we know the expression of either the cumulative distribution function (cdf) $F_{X|V,\theta}(x|\nu,\theta)$ or, equivalently, of its probability density function (pdf) $f_{X|V,\theta}(x|\nu,\theta)$. We assume that $\nu$ is a nuisance parameter on which we have some a priori information. This prior information can be complete knowledge of its value $\nu_0$, or increasingly incomplete: a prior distribution $F_V(\nu)$ (or a pdf $f_V(\nu)$), only the knowledge of a finite number of its moments, or just the knowledge of its median. For the first three cases there are classical solutions but, to our knowledge, there is not yet any solution for the last case. The main object of this paper is to propose a solution for it. This solution is based on a recently proposed inference tool which is obtained by using the median in place of the mean when a prior distribution is assigned to the nuisance parameter [1, 2, 3, 4].

The paper is organized as follows. First, we give a brief presentation of the three well-known approaches. Then, we summarize the recently proposed inference tool and show how it can be used for the last problem.

CLASSICAL APPROACHES OF PARAMETER ESTIMATION 

Assume that we are given an observation $x$ and that its cdf $F_{X|V,\theta}(x|\nu,\theta)$ (or, equivalently, its pdf $f_{X|V,\theta}(x|\nu,\theta)$) depends on two parameters $\nu$ and $\theta$. We assume that $\theta$ is the parameter of interest and $\nu$ is a nuisance parameter. We are looking for tools to infer $\theta$ from one observation $x$ and some prior knowledge on $\nu$. We then consider the following cases:

Perfect knowledge of $\nu$, i.e., $\nu = \nu_0$:

Then, the classical approach is the Maximum Likelihood (ML) estimate

$$\hat{\theta}_{ML} = \arg\max_\theta \left\{ l_0(\theta) = f_{X|V,\theta}(x|\nu_0,\theta) \right\}. \qquad (1)$$

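If we also have a prior $f_\theta(\theta)$ on the parameter of interest $\theta$, then we can use the Bayesian approach by computing the a posteriori distribution $f_{\theta|X,V}(\theta|x,\nu_0)$ and then use any estimator, such as the Maximum A Posteriori (MAP) estimate

$$\hat{\theta}_{MAP} = \arg\max_\theta \left\{ f_{\theta|X,V}(\theta|x,\nu_0) \right\} = \arg\max_\theta \left\{ l_0(\theta)\, f_\theta(\theta) \right\} \qquad (2)$$

or the Bayesian Mean Square Error (MSE) estimate

$$\hat{\theta}_{MSE} = E\{\theta\} = \int \theta\, f_{\theta|X,V}(\theta|x,\nu_0)\, d\theta = \frac{\int \theta\, l_0(\theta)\, f_\theta(\theta)\, d\theta}{\int l_0(\theta)\, f_\theta(\theta)\, d\theta}. \qquad (3)$$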
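As a numerical illustration of (1) to (3), the Python sketch below evaluates $l_0(\theta)$ on a grid of $\theta$ values and reads off the three estimates. It is a minimal sketch under assumptions of our own: the Gaussian model $\mathcal{N}(x;\nu_0,\theta)$ of Example 1 below (with $\theta$ the variance), a hypothetical exponential prior $f_\theta(\theta)$ with mean 1, and an arbitrary grid on $(0, 20]$.

```python
import math

def gauss_pdf(x, mean, var):
    """N(x; mean, var), var being the variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def prior_theta(theta, mean=1.0):
    """Hypothetical exponential prior f_theta(theta) (illustration only)."""
    return math.exp(-theta / mean) / mean

def grid_argmax(values, grid):
    """Grid point where the sampled values are largest."""
    return grid[max(range(len(grid)), key=values.__getitem__)]

def estimates_known_nuisance(x, nu0, grid):
    """Grid approximations of the ML (1), MAP (2) and MSE (3) estimates of theta."""
    lik = [gauss_pdf(x, nu0, t) for t in grid]               # l_0(theta)
    post = [l * prior_theta(t) for l, t in zip(lik, grid)]   # l_0(theta) f_theta(theta)
    theta_ml = grid_argmax(lik, grid)
    theta_map = grid_argmax(post, grid)
    # Ratio of Riemann sums on a uniform grid: the grid step cancels out.
    theta_mse = sum(t * p for t, p in zip(grid, post)) / sum(post)
    return theta_ml, theta_map, theta_mse

grid = [0.001 * k for k in range(1, 20001)]   # theta grid on (0, 20]
print(estimates_known_nuisance(x=2.0, nu0=0.5, grid=grid))
```

With $x = 2$ and $\nu_0 = 0.5$, the ML estimate should be close to $(x-\nu_0)^2 = 2.25$; the MAP and MSE estimates differ from it because of the prior. A grid this fine is only practical because $\theta$ is a scalar.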

Incomplete knowledge of $\nu$ through an a priori cdf $F_V(\nu)$ or pdf $f_V(\nu)$:

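The classical approach here is the Marginal Maximum Likelihood (MML) estimate

$$\hat{\theta}_{MML} = \arg\max_\theta \left\{ l_1(\theta) = f_{X|\theta}(x|\theta) \right\} \qquad (4)$$

where

$$f_{X|\theta}(x|\theta) = \int f_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu. \qquad (5)$$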

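Again, if we also have a prior $f_\theta(\theta)$, we can define the a posteriori distribution $f_{\theta|X}(\theta|x)$ and

$$\hat{\theta}_{MMAP} = \arg\max_\theta \left\{ f_{\theta|X}(\theta|x) \right\} = \arg\max_\theta \left\{ l_1(\theta)\, f_\theta(\theta) \right\} \qquad (6)$$

or

$$\hat{\theta}_{MMSE} = E\{\theta\} = \int \theta\, f_{\theta|X}(\theta|x)\, d\theta = \frac{\int \theta\, l_1(\theta)\, f_\theta(\theta)\, d\theta}{\int l_1(\theta)\, f_\theta(\theta)\, d\theta}. \qquad (7)$$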
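When (5) has no closed form, it can be approximated by one-dimensional quadrature. The Python sketch below applies the trapezoidal rule to (5) and maximizes the result over a grid of $\theta$ to approximate the MML estimate (4). As an assumed test case it uses the Gaussian model of Example 1 below with the prior $f_V(\nu) = \mathcal{N}(\nu;\nu_0,\theta_0)$, for which the exact answers are known; the integration range $[-8, 8]$, the grid sizes and the numerical values are illustrative choices.

```python
import math

def gauss_pdf(x, mean, var):
    """N(x; mean, var), var being the variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def trapezoid(ys, h):
    """Composite trapezoidal rule for samples ys on a uniform grid of step h."""
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def marginal_likelihood(x, theta, prior_pdf, lo, hi, n=1001):
    """Approximate (5): integral over [lo, hi] of N(x; nu, theta) f_V(nu) d nu."""
    h = (hi - lo) / (n - 1)
    nus = [lo + k * h for k in range(n)]
    return trapezoid([gauss_pdf(x, nu, theta) * prior_pdf(nu) for nu in nus], h)

x, nu0, theta0 = 2.0, 0.0, 0.5

def prior_pdf(nu):
    """Assumed Gaussian prior f_V(nu) = N(nu; nu0, theta0)."""
    return gauss_pdf(nu, nu0, theta0)

thetas = [0.01 * k for k in range(1, 601)]   # theta grid on (0, 6]
l1 = [marginal_likelihood(x, t, prior_pdf, -8.0, 8.0) for t in thetas]
theta_mml = thetas[max(range(len(thetas)), key=l1.__getitem__)]
print(theta_mml)
```

For this case the numerical marginal should agree with $\mathcal{N}(x;\nu_0,\theta+\theta_0)$, and the grid maximizer should be close to $\max((x-\nu_0)^2-\theta_0, 0) = 3.5$.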



Incomplete knowledge of $\nu$ through the knowledge of a finite number of its moments:

Assume now that our prior knowledge on the parameter $\nu$ is expressed through the knowledge of a finite number of its moments:

$$E\{\phi_k(V)\} = d_k, \quad k = 1, \dots, K, \qquad (8)$$

where $\phi_k$ are known functions. A particular case is $\phi_k(\nu) = \nu^k$, for which $\{d_k, k = 1, \dots, K\}$ are the moments of $V$ up to order $K$.

Here, we can use the principle of Maximum Entropy (ME) to assign a prior probability law $f_V(\nu)$; ME is the classical tool for assigning a probability law to a quantity when we know only a finite number of its moments. The solution is well known and is given by

$$f_V(\nu) = \exp\left\{ -\lambda_0 - \sum_{k=1}^{K} \lambda_k\, \phi_k(\nu) \right\} \qquad (9)$$

where the Lagrange parameters $\{\lambda_k, k = 0, \dots, K\}$ are the solution of the following system of equations:

$$\int \phi_k(\nu)\, \exp\left\{ -\lambda_0 - \sum_{j=1}^{K} \lambda_j\, \phi_j(\nu) \right\} d\nu = d_k, \quad k = 0, \dots, K, \qquad (10)$$

where we used $\phi_0(\nu) = 1$ and $d_0 = 1$ to include the normalization condition, which determines $\lambda_0$. For more details on ME, and also on the computation of the Lagrange parameters, refer to [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18].

Once we obtain an expression for $f_V(\nu)$ which translates our prior knowledge of the moments of the nuisance parameter, the problem becomes equivalent to the previous case.
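For a single moment constraint, the system (10) reduces to one monotone equation in $\lambda_1$, since $\partial E\{\phi_1(V)\}/\partial \lambda_1 = -\mathrm{Var}\{\phi_1(V)\} < 0$. The Python sketch below solves it by bisection on a truncated support. It is a minimal sketch: the function name `me_one_moment`, the truncation of the support, the grid and the bracket for $\lambda_1$ are our own assumptions, and several moments would instead need a multivariate solver such as Newton's method.

```python
import math

def trapezoid(ys, h):
    """Composite trapezoidal rule for samples ys on a uniform grid of step h."""
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def me_one_moment(phi, d, lo, hi, n=6001, lam_lo=1e-3, lam_hi=50.0, iters=50):
    """ME density exp(-lambda0 - lambda1*phi(nu)) on [lo, hi] with E{phi(V)} = d.

    Returns (lambda0, lambda1). E{phi(V)} decreases with lambda1, so bisection
    converges as long as the bracket [lam_lo, lam_hi] contains the solution.
    """
    h = (hi - lo) / (n - 1)
    ph = [phi(lo + k * h) for k in range(n)]

    def moment(lam):
        w = [math.exp(-lam * p) for p in ph]
        return trapezoid([p * wk for p, wk in zip(ph, w)], h) / trapezoid(w, h)

    if not moment(lam_hi) < d < moment(lam_lo):
        raise ValueError("bracket [lam_lo, lam_hi] does not contain the solution")
    for _ in range(iters):
        mid = 0.5 * (lam_lo + lam_hi)
        if moment(mid) > d:   # moment still too large: lambda1 must increase
            lam_lo = mid
        else:
            lam_hi = mid
    lam1 = 0.5 * (lam_lo + lam_hi)
    # lambda0 = ln Z enforces the k = 0 (normalization) equation of (10).
    lam0 = math.log(trapezoid([math.exp(-lam1 * p) for p in ph], h))
    return lam0, lam1

# E{V} = 2 on S = R+ (truncated to [0, 60]): exponential with mean 2.
print(me_one_moment(lambda nu: nu, d=2.0, lo=0.0, hi=60.0))
# E{|V|} = 1.5 on S = R (truncated to [-60, 60]): double exponential.
print(me_one_moment(abs, d=1.5, lo=-60.0, hi=60.0, n=12001))
```

In the first call the solution should be close to $\lambda_1 = 1/2$ and $\lambda_0 = \ln 2$, i.e. the exponential density with mean 2; in the second, close to $\lambda_1 = 1/1.5$ and $\lambda_0 = \ln 3$, i.e. the double exponential density with $E\{|V|\} = 1.5$.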

Incomplete knowledge of $\nu$ through the knowledge of its median only (new alternative criterion):

Assume now that our prior knowledge on the parameter $\nu$ is expressed through the knowledge of its median value. To our knowledge, there is no classical tool, such as ME in the previous case, to translate this knowledge into a probability law $f_V(\nu)$. The main contribution of this paper is precisely to provide a coherent solution to this case, which is detailed in the next section.



NEW INFERENCE TOOL 



Recently, we proposed a new alternative to the classical approach for this case. It consists in using an alternative criterion $\tilde{f}_{X|\theta}(x|\theta)$ (or, equivalently, $\tilde{F}_{X|\theta}(x|\theta)$), which we called the likelihood based on the median, in place of the likelihood function $f_{X|\theta}(x|\theta)$ (or, equivalently, $F_{X|\theta}(x|\theta)$) of the previous case.

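The name likelihood based on the median for $\tilde{f}_{X|\theta}(x|\theta)$ is motivated by the fact that $f_{X|\theta}(x|\theta)$ in

$$f_{X|\theta}(x|\theta) = \int f_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu = E_V\left\{ f_{X|V,\theta}(x|V,\theta) \right\}, \qquad (11)$$

or, equivalently, $F_{X|\theta}(x|\theta)$ in

$$F_{X|\theta}(x|\theta) = \int F_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu = E_V\left\{ F_{X|V,\theta}(x|V,\theta) \right\}, \qquad (12)$$

can be recognized as the mean value of $f_{X|V,\theta}(x|V,\theta)$ (or $F_{X|V,\theta}(x|V,\theta)$) over the probability law $f_V(\nu)$.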

The proposed new criterion is then defined as the median value of $F_{X|V,\theta}(x|V,\theta)$ over the probability law $f_V(\nu)$:

$$\tilde{F}_{X|\theta}(x|\theta): \quad P\left( F_{X|V,\theta}(x|V,\theta) \le \tilde{F}_{X|\theta}(x|\theta) \right) = 1/2.$$

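In previous works, we showed that, under some mild conditions on $F_{X|V,\theta}(x|\nu,\theta)$, the function $\tilde{F}_{X|\theta}(x|\theta)$ is strictly increasing in $x$ and has all the properties of a cdf, and thus the function $\tilde{l}(\theta) = \tilde{f}_{X|\theta}(x|\theta)$ has all the properties of a likelihood function. Thus, we can use it in place of $l_1(\theta)$, i.e.:

$$\hat{\theta}_{MLM} = \arg\max_\theta \left\{ \tilde{l}(\theta) = \tilde{f}_{X|\theta}(x|\theta) \right\} \qquad (13)$$

or, if we also have a prior $f_\theta(\theta)$,

$$\hat{\theta}_{MAPM} = \arg\max_\theta \left\{ \tilde{f}_{\theta|X}(\theta|x) \right\} = \arg\max_\theta \left\{ \tilde{l}(\theta)\, f_\theta(\theta) \right\} \qquad (14)$$

or

$$\hat{\theta}_{MSEM} = E\{\theta\} = \int \theta\, \tilde{f}_{\theta|X}(\theta|x)\, d\theta = \frac{\int \theta\, \tilde{l}(\theta)\, f_\theta(\theta)\, d\theta}{\int \tilde{l}(\theta)\, f_\theta(\theta)\, d\theta}. \qquad (15)$$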

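Indeed, we showed that the expression of $\tilde{F}_{X|\theta}(x|\theta)$ is given by

$$\tilde{F}_{X|\theta}(x|\theta) = L\left( F_V^{-1}(1/2) \right) \qquad (16)$$

where $L(\nu) = F_{X|V,\theta}(x|\nu,\theta)$ is assumed monotone in $\nu$. Thus, to obtain the expression of $\tilde{F}_{X|\theta}(x|\theta)$, we only need to know the median value $F_V^{-1}(1/2)$ of the distribution $f_V(\nu)$.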
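Relation (16) holds because, when $L(\nu)$ is strictly monotone, $P(L(V) \le L(\nu_{1/2})) = 1/2$ with $\nu_{1/2} = F_V^{-1}(1/2)$, so the median of $L(V)$ is $L$ evaluated at the median of $V$. The Python sketch below checks this by Monte Carlo under assumptions chosen only for illustration: $X|V,\theta \sim \mathcal{N}(V,\theta)$ as in Example 1, and an exponential $f_V$ with mean 2, whose median is $2\ln 2$. It compares the sample median of $F_{X|V,\theta}(x|V,\theta)$ with $L(F_V^{-1}(1/2))$, and also computes the sample mean, which estimates the classical marginal cdf (12) instead.

```python
import math
import random
import statistics

def cond_cdf(x, nu, theta):
    """L(nu) = F_{X|V,theta}(x | nu, theta) for X ~ N(nu, theta), theta the variance."""
    return statistics.NormalDist(mu=nu, sigma=math.sqrt(theta)).cdf(x)

def median_based_cdf(x, theta, median_v):
    """(16): L evaluated at the median of V (valid when L is monotone in nu)."""
    return cond_cdf(x, median_v, theta)

# Assumed setting: V exponential with mean 2, hence median 2 ln 2.
rng = random.Random(0)
mean_v = 2.0
samples = [rng.expovariate(1.0 / mean_v) for _ in range(100_000)]

x, theta = 1.0, 0.5
values = [cond_cdf(x, nu, theta) for nu in samples]   # draws of L(V)
mc_median = statistics.median(values)                 # median over f_V (new criterion)
closed_form = median_based_cdf(x, theta, mean_v * math.log(2.0))
mc_mean = statistics.fmean(values)                    # mean over f_V, i.e. (12)
print(mc_median, closed_form, mc_mean)
```

The sample median should agree with $L(F_V^{-1}(1/2))$ up to Monte Carlo error, while the sample mean, which estimates the marginal cdf (12), is in general different.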






In what follows, we consider the four aforementioned cases, i.e., i) the perfect knowledge $\nu = \nu_0$, ii) the knowledge of $f_V(\nu)$, iii) the knowledge of the mean value $\nu_0$ of $V$ and iv) the knowledge of the median $\nu_0$ of $V$, and examine them through a simple but difficult case where we have only one observation $x$ of $X$ with the pdf $f_{X|V,\theta}(x|\nu,\theta)$ and where we want to estimate $\theta$ with the aforementioned knowledge on the nuisance parameter $\nu$.



EXAMPLES 

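In what follows, we use the following notations and expressions:

Gaussian: $\mathcal{N}(x;\mu,\sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{1}{2\sigma^2}(x-\mu)^2 \right\}$

Exponential: $\mathcal{E}(x;\lambda) = \frac{1}{\lambda} \exp\{-x/\lambda\}, \quad x \ge 0$

Double Exponential: $\mathcal{DE}(x;\lambda) = \frac{1}{2\lambda} \exp\{-|x|/\lambda\}$

Gamma: $\mathcal{G}(x;\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} \exp\{-\beta x\}$

Inverse Gamma: $\mathcal{IG}(x;\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{-(\alpha+1)} \exp\{-\beta/x\}$

Student: $\mathcal{S}(x;\mu,\sigma^2,\alpha) = \frac{\Gamma((\alpha+1)/2)}{\Gamma(\alpha/2)\sqrt{\pi\alpha\sigma^2}} \left( 1 + \frac{(x-\mu)^2}{\alpha\sigma^2} \right)^{-(\alpha+1)/2}$

Cauchy: $\mathcal{C}(x;\mu,\sigma) = \frac{1}{\pi\sigma} \left( 1 + \frac{(x-\mu)^2}{\sigma^2} \right)^{-1}$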

Example 1 

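The first example we consider is

$$f_{X|V,\theta}(x|\nu,\theta) = \mathcal{N}(x;\nu,\theta) = (2\pi\theta)^{-1/2} \exp\left\{ -\frac{1}{2\theta}(x-\nu)^2 \right\}$$

where we assume that the mean value $\nu$ is the nuisance parameter and the variance $\theta$ is the parameter of interest. Then: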

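• Complete knowledge case $\nu = \nu_0$:
Then we have

$$f_{X|V,\theta}(x|\nu_0,\theta) = \mathcal{N}(x;\nu_0,\theta) = (2\pi\theta)^{-1/2} \exp\left\{ -\frac{1}{2\theta}(x-\nu_0)^2 \right\}$$

and the ML estimate of $\theta$ is obtained by

$$\hat{\theta} = \arg\max_\theta \left\{ f_{X|V,\theta}(x|\nu_0,\theta) \right\} = \arg\min_\theta \left\{ \frac{1}{2}\ln\theta + \frac{1}{2\theta}(x-\nu_0)^2 \right\},$$

which, setting the derivative $\frac{1}{2\theta} - \frac{(x-\nu_0)^2}{2\theta^2}$ to zero, gives $\hat{\theta} = (x-\nu_0)^2$.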

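• Prior pdf case $f_V(\nu) = \mathcal{N}(\nu;\nu_0,\theta_0)$:
Then we have

$$f_{X|\theta}(x|\theta) = \int f_{X|V,\theta}(x|\nu,\theta)\, f_V(\nu)\, d\nu = \int \mathcal{N}(x;\nu,\theta)\, \mathcal{N}(\nu;\nu_0,\theta_0)\, d\nu$$

and it is not difficult to show that $f_{X|\theta}(x|\theta) = \mathcal{N}(x;\nu_0,\theta+\theta_0)$. The MML estimate of $\theta$ is obtained by

$$\hat{\theta} = \arg\min_{\theta \ge 0} \left\{ \frac{1}{2}\ln(\theta+\theta_0) + \frac{1}{2(\theta+\theta_0)}(x-\nu_0)^2 \right\},$$

which gives $\hat{\theta} = \max\left( (x-\nu_0)^2 - \theta_0,\, 0 \right)$.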
• Moments knowledge case $E\{V\} = \nu_0$:
Then, we also need to know the support $S$ of $\nu$ to be able to use ME and assign $f_V(\nu)$. If $S = \mathbb{R}$, the ME pdf does not exist, but if $S = \mathbb{R}^+$, the ME pdf $f_V(\nu)$ is the exponential $\mathcal{E}(\nu;\nu_0)$. In this case we have

$$f_{X|\theta}(x|\theta) = \int \mathcal{N}(x;\nu,\theta)\, \mathcal{E}(\nu;\nu_0)\, d\nu,$$

for which we do not obtain a simple closed-form expression. However, the MML estimate can be computed numerically.

We may also note that, if we are given $E\{|V|\} = \nu_0$, then even for $S = \mathbb{R}$ the ME pdf exists and is given by $\mathcal{DE}(\nu;\nu_0)$. In this case we have

$$f_{X|\theta}(x|\theta) = \int \mathcal{N}(x;\nu,\theta)\, \mathcal{DE}(\nu;\nu_0)\, d\nu.$$

We do not obtain a simple closed-form expression for $f_{X|\theta}(x|\theta)$ either, but again the MML estimate can be computed numerically.

Finally, if we are given $E\{V\} = \nu_0$ and $E\{(V-\nu_0)^2\} = \theta_0$, then the ME pdf is the Gaussian $\mathcal{N}(\nu;\nu_0,\theta_0)$ and we go back to the case of the previous item.
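• Median knowledge case $\mathrm{Median}\{V\} = \nu_0$:
Then, as we have seen, (16) gives $\tilde{f}_{X|\theta}(x|\theta) = \mathcal{N}(x;\nu_0,\theta)$, and we can estimate $\theta$ by

$$\hat{\theta} = \arg\max_\theta \left\{ \tilde{f}_{X|\theta}(x|\theta) \right\} = \arg\min_\theta \left\{ \frac{1}{2}\ln\theta + \frac{1}{2\theta}(x-\nu_0)^2 \right\},$$

which gives $\hat{\theta} = (x-\nu_0)^2$.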
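In the moments knowledge item of this example, with $S = \mathbb{R}^+$ and the exponential prior $\mathcal{E}(\nu;\nu_0)$, the MML estimate has to be computed numerically. A minimal Python sketch follows; the values $x = 3$, $\nu_0 = 1$, the integration range $[0, 40]$ and the $\theta$ grid are illustrative assumptions. As a cross-check, this marginal is the density of $\nu + \varepsilon$ with $\nu \sim \mathcal{E}(\nu;\nu_0)$ and $\varepsilon \sim \mathcal{N}(0,\theta)$, known as the exponentially modified Gaussian density, which can be written with the complementary error function.

```python
import math

def gauss_pdf(x, mean, var):
    """N(x; mean, var), var being the variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def marginal_exp_prior(x, theta, nu0, hi=40.0, n=2001):
    """Trapezoidal approximation of the integral over [0, hi] of N(x; nu, theta) E(nu; nu0).

    The truncation at hi assumes nu0 is small compared with hi.
    """
    h = hi / (n - 1)
    ys = [gauss_pdf(x, k * h, theta) * math.exp(-k * h / nu0) / nu0 for k in range(n)]
    return h * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def emg_pdf(x, theta, nu0):
    """Density of nu + eps with nu ~ E(nu; nu0) and eps ~ N(0, theta)."""
    lam = 1.0 / nu0
    return (0.5 * lam * math.exp(0.5 * lam * (lam * theta - 2.0 * x))
            * math.erfc((lam * theta - x) / math.sqrt(2.0 * theta)))

x, nu0 = 3.0, 1.0
thetas = [0.05 * k for k in range(1, 301)]            # theta grid on (0, 15]
l1 = [marginal_exp_prior(x, t, nu0) for t in thetas]  # numerical l_1(theta)
theta_mml = thetas[max(range(len(thetas)), key=l1.__getitem__)]
print(theta_mml, marginal_exp_prior(x, 2.0, nu0), emg_pdf(x, 2.0, nu0))
```

The two values printed last should agree to several digits, and the grid maximizer gives the numerical MML estimate for these particular values of $x$ and $\nu_0$.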



Example 2 



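The second example we consider is

$$f_{X|V,\theta}(x|\nu,\theta) = \mathcal{N}(x;\theta,\nu) = (2\pi\nu)^{-1/2} \exp\left\{ -\frac{1}{2\nu}(x-\theta)^2 \right\}$$

where, this time, we assume that $\nu$ is the variance and the nuisance parameter, and the mean $\theta$ is the parameter of interest. Then: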

• Complete knowledge case $\nu = \nu_0$:
The ML estimate of $\theta$ is obtained by $\hat{\theta} = \arg\max_\theta \left\{ f_{X|V,\theta}(x|\nu_0,\theta) \right\}$, which gives $\hat{\theta} = x$.

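• Prior pdf case $f_V(\nu) = \mathcal{IG}(\nu;\alpha/2,\beta/2)$:
Then,

$$\begin{aligned}
f_{X|\theta}(x|\theta) &= \int \mathcal{N}(x;\theta,\nu)\, \mathcal{IG}(\nu;\alpha/2,\beta/2)\, d\nu \\
&= \int (2\pi\nu)^{-1/2} \exp\left\{ -\frac{(x-\theta)^2}{2\nu} \right\} \frac{(\beta/2)^{\alpha/2}}{\Gamma(\alpha/2)}\, \nu^{-(\alpha/2+1)} \exp\left\{ -\frac{\beta}{2\nu} \right\} d\nu \\
&= \mathcal{S}(x;\theta,\beta/\alpha,\alpha)
\end{aligned}$$

and we can estimate $\theta$ by $\hat{\theta} = \arg\max_\theta \left\{ f_{X|\theta}(x|\theta) \right\}$, which gives $\hat{\theta} = x$.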
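• Moments knowledge case $E\{V\} = \nu_0$:
Then, knowing that the variance is a positive quantity ($S = \mathbb{R}^+$), the ME pdf $f_V(\nu)$ is the exponential $\mathcal{E}(\nu;\nu_0)$. In this case we have

$$\begin{aligned}
f_{X|\theta}(x|\theta) &= \int \mathcal{N}(x;\theta,\nu)\, \mathcal{E}(\nu;\nu_0)\, d\nu \\
&= \int (2\pi\nu)^{-1/2} \exp\left\{ -\frac{(x-\theta)^2}{2\nu} \right\} \frac{1}{\nu_0} \exp\{-\nu/\nu_0\}\, d\nu \\
&= \mathcal{DE}\left( x-\theta;\, \sqrt{\nu_0/2} \right)
\end{aligned}$$

and $\hat{\theta} = x$.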

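• Median knowledge case $\mathrm{Median}\{V\} = \nu_0$:
Then, as we have seen, we have $\tilde{f}_{X|\theta}(x|\theta) = \mathcal{N}(x;\theta,\nu_0)$, and we can estimate $\theta$ by $\hat{\theta} = \arg\max_\theta \left\{ \tilde{f}_{X|\theta}(x|\theta) \right\}$, which gives $\hat{\theta} = x$.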
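The closed forms of the prior pdf and moments items of this example can be checked numerically. For the exponential prior, the integral follows from $\int_0^\infty \nu^{-1/2} e^{-a/\nu - b\nu}\, d\nu = \sqrt{\pi/b}\, e^{-2\sqrt{ab}}$ with $a = (x-\theta)^2/2$ and $b = 1/\nu_0$, which gives $\mathcal{DE}(x-\theta;\sqrt{\nu_0/2})$. The Python sketch below integrates both mixtures over $\nu$ on a logarithmic grid and compares them with $\mathcal{S}(x;\theta,\beta/\alpha,\alpha)$ (second argument the squared scale, third the degrees of freedom) and with that double exponential. The parameter values and grid limits are arbitrary assumptions.

```python
import math

def gauss_pdf(x, mean, var):
    """N(x; mean, var), var being the variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def inv_gamma_pdf(v, a, b):
    """IG(v; a, b) = b^a / Gamma(a) * v^(-(a+1)) * exp(-b/v), evaluated in log space."""
    return math.exp(a * math.log(b) - math.lgamma(a) - (a + 1.0) * math.log(v) - b / v)

def student_pdf(x, mu, s2, dof):
    """S(x; mu, s2, dof): location mu, squared scale s2, dof degrees of freedom."""
    c = math.exp(math.lgamma((dof + 1.0) / 2.0) - math.lgamma(dof / 2.0))
    c /= math.sqrt(math.pi * dof * s2)
    return c * (1.0 + (x - mu) ** 2 / (dof * s2)) ** (-(dof + 1.0) / 2.0)

def laplace_pdf(x, mu, scale):
    """DE(x - mu; scale) = exp(-|x - mu| / scale) / (2 scale)."""
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)

def mix_over_variance(x, theta, prior, umin=-20.0, umax=12.0, n=32001):
    """Trapezoidal integral of N(x; theta, v) prior(v) dv, using v = exp(u), dv = v du."""
    h = (umax - umin) / (n - 1)
    total = 0.0
    for k in range(n):
        v = math.exp(umin + k * h)
        w = 0.5 if k in (0, n - 1) else 1.0   # trapezoid end-point weights
        total += w * gauss_pdf(x, theta, v) * prior(v) * v
    return h * total

theta, alpha, beta, nu0 = 0.2, 3.0, 2.0, 1.5   # illustrative values

def ig_prior(v):
    """f_V(v) = IG(v; alpha/2, beta/2)."""
    return inv_gamma_pdf(v, alpha / 2.0, beta / 2.0)

def exp_prior(v):
    """f_V(v) = E(v; nu0), the exponential with mean nu0."""
    return math.exp(-v / nu0) / nu0

for x in (-1.0, 0.5, 2.5):
    print(x,
          mix_over_variance(x, theta, ig_prior), student_pdf(x, theta, beta / alpha, alpha),
          mix_over_variance(x, theta, exp_prior), laplace_pdf(x, theta, math.sqrt(nu0 / 2.0)))
```

On each printed line, the first two and the last two numbers should agree up to quadrature error.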

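We may note that none of the estimates of the mean $\theta$ obtained when the nuisance parameter $\nu$ is the variance depends on the knowledge of this variance. The reason is that, with a single observation, each of these likelihoods is a symmetric, unimodal function of $x - \theta$ whose maximum is at $\theta = x$ whatever the scale, so the likelihood-based estimators of this location parameter are not affected by the scale.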
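The following Python sketch illustrates this remark. It maximizes, on a grid of $\theta$, the Gaussian likelihood $\mathcal{N}(x;\theta,\nu_0)$ (complete and median knowledge cases), the Student likelihood $\mathcal{S}(x;\theta,\beta/\alpha,\alpha)$ (inverse gamma prior) and the double exponential likelihood $\mathcal{DE}(x-\theta;\sqrt{\nu_0/2})$ (exponential prior), for several values of $\nu_0$. The observation, grid and parameter values are illustrative assumptions.

```python
import math

def gauss_pdf(x, mean, var):
    """N(x; mean, var), var being the variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def student_pdf(x, mu, s2, dof):
    """S(x; mu, s2, dof): location mu, squared scale s2, dof degrees of freedom."""
    c = math.exp(math.lgamma((dof + 1.0) / 2.0) - math.lgamma(dof / 2.0))
    c /= math.sqrt(math.pi * dof * s2)
    return c * (1.0 + (x - mu) ** 2 / (dof * s2)) ** (-(dof + 1.0) / 2.0)

def laplace_pdf(x, mu, scale):
    """DE(x - mu; scale) = exp(-|x - mu| / scale) / (2 scale)."""
    return math.exp(-abs(x - mu) / scale) / (2.0 * scale)

GRID = [-5.0 + 0.001 * k for k in range(10001)]   # theta grid on [-5, 5]

def estimates_for(x, nu0, alpha=3.0, beta=2.0):
    """Grid maximizer of each likelihood of Example 2, as a function of theta."""
    likelihoods = {
        "Gaussian (known / median variance)": lambda t: gauss_pdf(x, t, nu0),
        "Student (inverse gamma prior)": lambda t: student_pdf(x, t, beta / alpha, alpha),
        "Laplace (exponential prior)": lambda t: laplace_pdf(x, t, math.sqrt(nu0 / 2.0)),
    }
    return {name: max(GRID, key=lik) for name, lik in likelihoods.items()}

x = 0.7
for nu0 in (0.5, 2.0, 8.0):
    for name, est in estimates_for(x, nu0).items():
        print(f"nu0 = {nu0}: {name}: theta_hat = {est:.3f}")
```

Every printed estimate should equal $x = 0.7$ up to the grid step, whatever the value of $\nu_0$.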



CONCLUSIONS 

In this paper we considered the problem of estimating one of the two parameters of a 
probability distribution when the other one is considered as a nuisance parameter on 
which we may have some prior information. We then considered and compared four 
cases: 

i) the complete knowledge case, where the nuisance parameter is known exactly. This is the simplest case, and the classical likelihood-based methods apply.

ii) the incomplete knowledge case, where our prior knowledge is translated through a prior probability distribution. In this case, we can integrate out the nuisance parameter, obtain a marginal likelihood and use it for estimating the parameter of interest.

iii) the incomplete knowledge case, where our prior knowledge is given in the form of a finite number of moments of the nuisance parameter. In this case, we can use the ME principle to translate our prior knowledge into a prior pdf and go back to the previous case.

iv) the incomplete knowledge case, where our prior knowledge is only the median value of the nuisance parameter. For this case, to the best of the authors' knowledge, there is no classical approach; based on our previous works, we presented a new inference tool which can handle this case.

Finally, to illustrate these cases, we presented a few examples showing their similarities and differences.